 Web


WASP: Benchmarking Web Agent Security Against Prompt Injection Attacks

Neural Information Processing Systems

Autonomous UI agents powered by AI have tremendous potential to boost human productivity by automating routine tasks such as filing taxes and paying bills. However, a major challenge in unlocking their full potential is security, which is exacerbated by the agent's ability to take action on their user's behalf. Existing tests for prompt injections in web agents either over-simplify the threat by testing unrealistic scenarios or giving the attacker too much power, or look at single-step isolated tasks. To more accurately measure progress for secure web agents, we introduce WASP -- a new publicly available benchmark for end-to-end evaluation of Web Agent Security against Prompt injection attacks. Evaluating with WASP shows that even top-tier AI models, including those with advanced reasoning capabilities, can be deceived by simple, low-effort human-written injections in very realistic scenarios. Our end-to-end evaluation reveals a previously unobserved insight: while attacks partially succeed in up to 86% of the cases, even state-of-the-art agents often struggle to fully complete the attacker goals -- highlighting the current state of security by incompetence.


WebThinker: Empowering Large Reasoning Models with Deep Research Capability

Neural Information Processing Systems

Large reasoning models (LRMs), such as OpenAI-o1 and DeepSeek-R1, demonstrate impressive long-horizon reasoning capabilities. However, their reliance on static internal knowledge limits their performance on complex, knowledge-intensive tasks and hinders their ability to produce comprehensive research reports requiring synthesis of diverse web information. To address this, we propose WebThinker, a deep research agent that empowers LRMs to autonomously search the web, navigate among web pages, and draft reports during the reasoning process. WebThinker integrates a Deep Web Explorer module, enabling LRMs to dynamically search, navigate, and extract information from the web when encountering knowledge gaps. It also employs an Autonomous Think-Search-and-Draft strategy, allowing the model to seamlessly interleave reasoning, information gathering, and report writing in real time. To further enhance research tool utilization, we introduce an RL-based training strategy via iterative online Direct Preference Optimization (DPO). Extensive experiments on complex reasoning benchmarks (GPQA, GAIA, WebWalkerQA, HLE) and scientific report generation tasks (Glaive) demonstrate that WebThinker significantly outperforms existing methods and strong proprietary systems. Our approach enhances LRM reliability and applicability in complex scenarios, paving the way for more capable and versatile deep research systems.


Embodied Web Agents: Bridging Physical-Digital Realms for Integrated Agent Intelligence

Neural Information Processing Systems

AI agents today are mostly siloed -- they either retrieve and reason over vast amounts of digital information and knowledge obtained online, or interact with the physical world through embodied perception, planning and action -- but rarely both. This separation limits their ability to solve tasks that require integrated physical and digital intelligence, such as cooking from online recipes, navigating with dynamic map data, or interpreting real-world landmarks using web knowledge. We introduce Embodied Web Agents, a novel paradigm for AI agents that fluidly bridge embodiment and web-scale reasoning. To operationalize this concept, we first develop the Embodied Web Agents task environments, a unified simulation platform that integrates realistic 3D indoor and outdoor environments with functional web interfaces. Building upon this platform, we construct and release the Embodied Web Agents Benchmark, which encompasses a diverse suite of tasks including cooking, navigation, shopping, tourism, and geolocation -- all requiring coordinated reasoning across physical and digital realms for systematic assessment of cross-domain intelligence. Experimental results reveal significant performance gaps between state-of-the-art AI systems and human capabilities, establishing both challenges and opportunities at the intersection of embodied cognition and web-scale knowledge access.


How to Disable Google's Gemini in Chrome

WIRED

Chrome users were caught off guard by a 4-GB Google AI model baked into Chrome, sparking privacy concerns. You might not want to. If you use Google's Chrome browser for desktop, there's probably a Gemini Nano AI model running on your computer right now and taking up about 4 GB of space. That's not necessarily a bad thing, but if you didn't know about it and don't want it, there's a way to turn it off. The file started auto-downloading for Chrome users in 2024 after Google built Gemini Nano into the browser.


15 free apps that unlock the best version of your Chromebook

PCWorld

PCWorld highlights 15 essential free apps that can significantly enhance Chromebook functionality, covering VPNs, photo editing, and Android applications. Key recommendations include Proton VPN for unlimited data and privacy protection, plus photo editing apps like Snapseed, Pixlr, and Photoshop Express. These apps help Chromebook users access geo-restricted content, edit photos professionally, and maximize their device's potential as a viable alternative to traditional computers. Macs and PCs are no longer the only options for those looking to buy a computer. One of the fastest-growing categories is the Chromebook, Google's own line of devices. Initially, Chromebooks were low-cost machines largely limited to the Chrome browser. In recent years, more premium Chromebook models with greater processing power have entered the market.


Chrome silently downloads a 4GB AI model. Here's how to remove it

PCWorld

PCWorld discovered that Google Chrome silently downloads a 4GB AI model called Gemini Nano to users' computers without explicit consent. This AI model provides local features like text summarization and scam warnings, but consumes significant storage space on devices. Users can permanently remove the file by disabling "On-device AI" in Chrome's system settings. Detailed instructions are provided below. Google's Chrome browser is already a notorious storage hog, but now comes word that it's crowding our PC drives in a new way: with a local AI model. That model, spotted by That Privacy Guy, gets silently downloaded to your PC or Mac upon installing Chrome, and it gobbles up a whopping 4GB of storage space. Spoiler alert: Yes, you can remove the file, and I'm going to show you how.


A Dataset for Analyzing Streaming Media Performance over HTTP/3 Browsers

Neural Information Processing Systems

HTTP/3 is a new application layer protocol supported by most browsers. It uses QUIC as an underlying transport protocol. QUIC provides multiple benefits, like faster connection establishment, reduced latency, and improved connection migration. Hence, popular browsers like Chrome/Chromium, Microsoft Edge, Apple Safari, and Mozilla Firefox have started supporting it. This paper presents an HTTP/3-supported browser dataset collection tool named H3B.


Amazon's cloud 'hit by two outages caused by AI tools last year'

The Guardian

A technician works at an Amazon Web Services AI datacentre in New Carlisle, Indiana. Reported issues at Amazon Web Services raise questions about firm's use of artificial intelligence as it cuts staff. Amazon's huge cloud computing arm reportedly experienced at least two outages caused by its own artificial intelligence tools, raising questions about the company's embrace of AI as it lays off human employees. A 13-hour interruption to Amazon Web Services' (AWS) operations in December was caused by an AI agent autonomously choosing to "delete and then recreate" a part of its environment, the Financial Times reported. AWS, which provides vital infrastructure for much of the internet, suffered several outages last year.